On the Relationship Between Binary Classification, Bipartite Ranking, and Binary Class Probability Estimation
Abstract
We investigate the relationship between three fundamental problems in machine learning: binary classification, bipartite ranking, and binary class probability estimation (CPE). It is known that a good binary CPE model can be used to obtain a good binary classification model (by thresholding at 0.5), and also to obtain a good bipartite ranking model (by using the CPE model directly as a ranking model); it is also known that a binary classification model does not necessarily yield a CPE model. However, not much is known about other directions. Formally, these relationships involve regret transfer bounds. In this paper, we introduce the notion of weak regret transfer bounds, where the mapping needed to transform a model from one problem to another depends on the underlying probability distribution (and in practice, must be estimated from data). We then show that, in this weaker sense, a good bipartite ranking model can be used to construct a good classification model (by thresholding at a suitable point), and more surprisingly, also to construct a good binary CPE model (by calibrating the scores of the ranking model).
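The three conversions named in the abstract can be illustrated with a small sketch. The abstract does not say how the "suitable point" or the calibration map is estimated, so the choices here are assumptions: the threshold is picked by minimizing empirical 0-1 error on a labeled sample, and calibration uses isotonic regression via pool-adjacent-violators. The function names `threshold_cpe`, `best_threshold`, and `pav_calibrate` are hypothetical, and labels are assumed to be +1/-1.

```python
def threshold_cpe(eta_hat, t=0.5):
    """CPE -> classifier: predict +1 when the estimated P(y=+1 | x) exceeds t."""
    return [1 if p > t else -1 for p in eta_hat]


def best_threshold(scores, labels):
    """Ranker -> classifier (assumed procedure): choose the threshold on the
    ranking scores that minimizes empirical 0-1 error on the sample."""
    distinct = sorted(set(scores))
    # Candidates: one point below every score, then each distinct score.
    candidates = [distinct[0] - 1.0] + distinct
    best_t, best_errors = None, None
    for t in candidates:
        errors = sum((1 if s > t else -1) != y for s, y in zip(scores, labels))
        if best_errors is None or errors < best_errors:
            best_t, best_errors = t, errors
    return best_t, best_errors / len(scores)


def pav_calibrate(scores, labels):
    """Ranker -> CPE (assumed procedure): isotonic regression of 0/1 labels
    on the scores, computed with pool-adjacent-violators."""
    # Pool tied scores first so that equal scores get equal probabilities.
    groups = {}
    for s, y in zip(scores, labels):
        total, count = groups.get(s, (0.0, 0))
        groups[s] = (total + (1.0 if y == 1 else 0.0), count + 1)

    blocks = []  # each block: [sum of 0/1 labels, count, scores in block]
    for s in sorted(groups):
        total, count = groups[s]
        blocks.append([total, count, [s]])
        # Merge backwards while block means violate monotonicity.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total2, count2, members = blocks.pop()
            blocks[-1][0] += total2
            blocks[-1][1] += count2
            blocks[-1][2].extend(members)

    calibrated = {}
    for total, count, members in blocks:
        for s in members:
            calibrated[s] = total / count
    return [calibrated[s] for s in scores]
```

Both `best_threshold` and `pav_calibrate` depend on a labeled sample, which mirrors the abstract's point that in the weak sense the transforming map depends on the underlying distribution and must be estimated from data. The reverse direction needs no data: the output of `pav_calibrate` can be passed straight to `threshold_cpe`.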
Similar resources
Bayes-Optimal Scorers for Bipartite Ranking
We address the following seemingly simple question: what is the Bayes-optimal scorer for a bipartite ranking risk? The answer to this question helps elucidate the relationship between bipartite ranking and other established learning problems. We show that the answer is non-trivial in general, but may be easily determined for certain special cases using the theory of proper losses. Our analysis ...
Feature-based Malicious URL and Attack Type Detection Using Multi-class Classification
Malicious URLs are now a common threat to businesses, social networks, net-banking, etc. Existing approaches have focused on binary detection, i.e., whether a URL is malicious or benign. Very little literature addresses the detection of malicious URLs together with their attack types. Hence, it becomes necessary to know the attack type and adopt an effective countermeasure. This pa...
Active Sampling of Pairs and Points for Large-scale Linear Bipartite Ranking
Bipartite ranking is a fundamental ranking problem that learns to order relevant instances ahead of irrelevant ones. One major approach for bipartite ranking, called the pair-wise approach, tackles an equivalent binary classification problem of whether one instance out of a pair of instances should be ranked higher than the other. Nevertheless, the number of instance pairs constructed from the ...
Bipartite ranking: risk, optimality, and equivalences
We present a systematic study of the bipartite ranking problem, with the aim of delineating its connections to the class-probability estimation problem. Our study focuses on the properties of the statistical risk for bipartite ranking, which is closely related to the area under the ROC curve: we establish alternate representations of the risk, relate the Bayes-optimal risk to a class of probabi...
On classification, ranking, and probability estimation
Given a binary classification task, a ranker is an algorithm that can sort a set of instances from highest to lowest expectation that the instance is positive. In contrast to a classifier, a ranker does not output class predictions – although it can be turned into a classifier with help of an additional procedure to split the ranked list into two. A straightforward way to compute rankings is to...
Publication date: 2013